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Abstract 

In  this  paper  we  discuss  a number  of  recent  developments  in  the  practice  of  how  to 
compute  with  radial  basic  functions.  The  two  main  problems  addressed  are  how  to  develop 
fast  evaluation  schemes  for  radial  basic  functions,  and  how  to  efficiently  carry  out  the 
solution  of  the  interpolation  problem.  The  approach  is  to  mainly  describe  work  which 
has  involved  the  author  and  Professor  Rick  Beatson  as  contributors,  and  to  include  an 
idiosyncratic  selection  of  works  by  other  researchers  which  have  attracted  the  attention 
of  the  author. 

1 Introduction 

Research  into  radial  basic  functions  has  been  active  now  for  about  30  years.  The  basic 
setup  is  as  follows.  A function  xf  : 1R"  — > 1R,  which  we  refer  to  as  the  basic  function,  is 
specified.  A subspace  is  then  constructed  by  reference  to  points  ®i, . . . ,xm  in  IR”.  The 
members  of  this  subspace  all  have  the  form 

m 

s{x)  = 22/ai'\p{x  - Xi),  x E !Rn, 

2=1 

where  the  a\,...  ,am  are  real  numbers.  It  is  important  to  appreciate  at  the  outset  that 
throughout  this  paper,  and  indeed  in  most  of  the  papers  appearing  in  this  area,  the  un- 
derlying assumption  is  that  the  points  Xi, . . . ,xm  are  distinct.  One  of  the  most  common 
tasks  for  which  these  functions  are  used  is  interpolation.  A small  amount  of  research 
has  been  carried  out  where  the  points  at  which  an  interpolant  is  developed  are  arbitrary 
distinct  points  in  ]Rn,  but  by  far  the  majority  of  the  work  relates  to  interpolation  which 
is  carried  out  at  the  same  points  as  those  used  to  effect  the  translation.  Accordingly, 
data  di,...,dm  are  given  at  X\, . . . ,xm , and  we  require  that 

m 

dj  = s(xj)  = 'Y^aixf{xj  - x^,  j = 1, ...  ,m.  (1.1) 

i=l 

Two  immediate  observations  present  themselves.  Firstly,  at  the  present  level  of  generality 
there  is  absolutely  no  guarantee  that  the  Equations  (1.1)  will  have  a unique  solution. 
Secondly,  one  knows  from  the  work  of  Mairhuber  [14]  that  there  are  no  Haa.r  subspaces 
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of  significant  dimension  in  any  space  IR"  for  n > 2.  What  this  means  is  that  if  we  are  to 
construct  interpolation  problems  which  have  a unique  solution  for  each  location  of  the 
data  points  xi,  ■ . . , xm  and  for  each  choice  of  the  data  d\, . . . , dm,  then  the  subspace  used 
must  vary  as  the  interpolation  points  vary.  If  we  pause  for  a moment  and  consider  how  we 
might  in  some  sensible  and  orderly  way  vary  the  subspace  as  the  points  xi,...,xm  vary, 
then  using  simple  shifts  of  a single  basic  function  ip  is  one  of  the  most  natural  choices. 
It  is  very  common  to  work  with  a function  ip  which  is  a radial  function.  Thus  we  take  a 
function  (p  : 1R+  — > 1R  and  determine  ip  by  the  rule  ip(x)  = (p(\x\)  for  all  x G IR".  Note 
that  throughout  this  account,  the  symbol  | • ] will  stand  for  the  Euclidean  norm  in  IR". 
At  this  point  a common  inaccuracy  arises.  The  function  ip  can  be  correctly  referred  to  as 
a radial  basic  function.  However,  many  authors  give  this  appellation  to  the  function  (p, 
whose  radiality  is  of  no  consequence  whatsoever,  since  it  would  imply  that  <p  was  simply 
an  even  function  on  IR.  Since  (p  only  acts  on  1R+  the  idea  that  <p  can  be  radial  is  vacuous. 
Let  us  continue  in  this  spirit  of  criticism  a little  while  longer.  As  far  as  the  author  is 
aware,  only  two  people  in  the  world  would  refer  to  ip  as  a basic  function,  or  a radial  basic 
function.  All  other  authors  would  use  the  word  basis  in  place  of  basic.  There  are  very 
obvious  problems  with  this  terminology.  We  are  seeking  to  generate  subspaces  which  are 
suitable  for  interpolation.  Such  subspaces  will  naturally  have  the  same  dimension  as  the 
number  of  data,  and  the  functions  {ip(-  — xf)  : i = 1,. . . ,m}  should  form  a basis  for 
the  subspace.  The  use  of  the  word  basis  in  two  completely  different  senses  seems  to  the 
author  to  be  misleading  and  unhelpful,  whereas  use  of  the  word  basic  — a difference  of 
one  character  — eliminates  any  possibility  of  confusion,  and  avoids  the  use  of  the  word 
basis,  which  has  a very  specific  mathematical  meaning,  in  a context  where  its  meaning 
is  not  the  usual  mathematical  one. 

The  problem  about  whether  interpolation  is  possible  has  a highly  satisfactory  answer 
in  the  work  of  Micchelli  [15].  We  direct  the  reader  to  the  book  of  Cheney  and  Light  [10] 
for  a full  account  of  these  matters.  A couple  of  examples  will  be  helpful.  If  one  chooses 

m m 

s(x ) = 'Y^aiCpdx  — Xi\)  = aj  exp(— \x  — Xj|2),  x G IR", 
i= 1 i= 1 

or 

m m 

s(x)  = '^ai(p{\x  - Xi\)  = \x  - aq|,  x G H", 

i=l  i= 1 

then  the  resulting  interpolation  problem  is  uniquely  solvable  for  any  choice  of  xi, . . . , xrn 
and  for  any  data  d\, . . . , dm.  This  result  contrasts  very  strongly  with  the  case  for  poly- 
nomial interpolation,  where  the  data  points  X\, . ... , xm  have  to  be  constrained  not  to  lie 
on  an  algebraic  surface  of  appropriate  degree.  Indeed,  the  alternative  formulation  of  the 
above  result  for  the  second  example  is  quite  often  surprising  to  mathematicians  who  are 
uninitiated  in  the  theory  of  radial  basic  functions. 

Theorem  1.1  Let  x\,...  ,xm  be  distinct  points  in  lRn.  Then  the  matrix  ( — Xi\)  is 
invertible. 

Having  drawn  a clear  distinction  between  polynomial  approximation  and  approxim- 
ation by  (radial)  basic  functions  it  is  at  this  point  that  we  must  consider  having  some 


polynomial  ingredients  in  our  interpolant.  This  is  done  in  a very  standard  way  by  a 
process  we  call  augmentation  by  polynomials.  We  consider  interpolants  of  the  form 

m 

s(x)  - y>^(jx  - s.l)  +p(-'c)i  (x£lRn). 

i=l 

Here  p is  a polynomial  of  total  degree  at  most  k — 1.  We  still  wish  to  interpolate  to 
m pieces  of  information,  but  now  have  more  than  m parameters  to  determine  with 
this  information.  The  remaining  parameters  are  determined  via  the  ‘natural’  boundary 
conditions.  The  full  set  of  equations  is 

m 

dj  - s(xj)  = Y,ai<t>(\xj-Xi\),  j = 1, . . ,m 

i-\ 

m 

0 = ^2aiq(xi),  for  all  q £ 7Tfc_i(lRn). 

i=l 

Here  7T/j_i(lRn)  represents  the  space  of  polynomials  of  total  degree  k — 1 in  IR".  Two 
questions  present  themselves  pretty  quickly  from  this  additional  hypothesis.  Why  should 
polynomials  be  added  to  the  interpolant,  and  why  are  the  boundary  conditions  chosen  in 
this  particular  way?  In  some  sense  it  is  essential  that  we  allow  ourselves  the  possibility 
of  adding  polynomial  terms  to  some  of  the  interpolants,  as  we  shall  soon  see.  The  most 
important  example  of  a radial  basic  function  interpolant  which  has  a polynomial  part 
will  be  the  thin-plate  spline.  We  will  make  considerable  reference  to  this  interpolant  in 
IR2,  where  it  has  the  form 

m 

s(x)  = /C  ai\x  ~ Xi I2 1R  lx  ~ Xi I + ax  + b,  (x  £ ]R2). 

t= l 

Note  here  that  the  parameter  a is  a vector  with  two  entries,  as  is  x.  Thus  ax  stands 
for  the  dot  product  between  a and  x.  The  parameter  b is  a real  number.  The  natural 
boundary  conditions  take  the  form 

m m m 

0/i  — OiSj  — adi  — 0, 
i—1  i=l  i—  1 

where  x^  = ( Sf,tj ),  i = 1, . . . , m.  This  particular  interpolant  exhibits  a feature  common 
to  all  the  cases  where  augmentation  by  polynomials  is  either  necessary  or  desirable: 
the  degree  of  the  polynomial  added  is  very  low.  The  usual  choices  are  k = 0 (when 
no  polynomial  term  is  added),  k = 1 (when  the  term  is  a constant  polynomial)  and 
k = 2 (when  the  added  polynomial  is  linear).  It  is  now  no  longer  possible  to  carry 
out  interpolation  for  all  choices  of  the  points  X\, . . . , xm.  One  must  avoid  distributions 
of  these  points  which  lie  on  a zero  surface  of  the  corresponding  polynomial  subspace. 
In  the  explicit  case  we  considered  above  (thin-plate  splines),  the  very  mild  restriction 
needed  is  that  aq, . . . , xm  should  not  all  lie  on  a single  straight  line.  The  theory  developed 
by  Micchelli  [15]  includes  the  case  of  augmentation  by  polynomials. 

We  now  propose  to  take  a look  at  a very  simple  example  which  we  hope  will  give  the 


Computing  with  radial  basic  functions 


reader  a feel  for  some  of  the  ideas  and  concepts  we  have  introduced  so  far.  We  consider 

m 

s(x)  = 52atla:  - xi\  + b (x  e H). 

2=1 

Here  the  parameter  b is  a real  number,  and  the  natural  boundary  condition  gives  us 
i ai  = 0.  A unique  feature  of  the  univariate  case  is  that  we  can  order  the  interpolation 
points  xi  < X2  < ■ ■ ■ < xm.  Now  consider  the  function  s in  one  of  the  intervals  [xi,xi+ 1], 
i — 1, . . . , m — 1.  It  is  clear  that  in  such  an  interval  s is  simply  a linear  function.  The 
demand  that  s interpolates  the  data  at  xi,...,xm  means  that  s must  be  the  piecewise 
linear  interpolant  to  the  data  in  the  interval  [xi,xm\-  What  is  the  effect  of  the  ‘natural’ 
boundary  conditions?  In  the  interval  [xm,  oo)  we  can  write 

m m 

s(x ) = aj(x  - x^  + b = - ajXj  + b. 

i= 1 i— 1 

Thus  s is  constant  in  [xm,  oo).  A similar  calculation  reveals  that  s is  constant  in  (— oo,  x\\. 
Combining  all  these  observations  shows  that  s is  the  natural  linear  spline  interpolant 
to  the  data  at  xi,...,xm.  This  goes  some  way  to  explaining  why  the  word  ‘natural’  is 
appended  the  boundary  or  extra  conditions.  But  we  can  go  a little  further.  It  is  well 
known  that  the  natural  splines  satisfy  a variational  principle.  For  the  linear  spline,  if  we 
examine 

x = {f  eS'  :f  eL\  1R)}, 


then 


/OO  nOO 

(s')2  < / (/')2 

-oo  J — oo 


for  all  f e X which  also  interpolate  the  data.  This  variational  principle  is  very  useful  in 
developing  error  estimates,  and  we  shall  return  to  this  general  thread  of  ideas  later  in  this 
account.  However,  we  ought  to  observe  that  S'  is  the  space  of  tempered  distributions, 
and  that  the  first  derivative  is  to  be  taken  in  the  distributional  sense.  There  are  ways 
of  getting  round  this  distributional  approach  (see  Cheney  and  Light  [10]  for  an  example 
which  corresponds  closely  to  the  discussion  here),  but  it  does  give  the  most  succinct  de- 
scription, and  creates  the  technical  background  which  will  underpin  all  the  theory  which 
has  been  developed  in  this  area.  Notice  also  that  the  quantity  being  minimised  can  be 
used  to  specify  a seminorm  on  X simply  by  taking  the  square  root  of  the  integral.  This 
seminorm  has  as  kernel  7r0(IR),  which  is  precisely  the  polynomial  subspace  we  use  to  aug- 
ment the  original  radial  basic  function.  Something  very  fundamental  is  happening  here. 
Most  mathematicians  would  regard  this  seminorm  as  being  a measure  of  smoothness 
of  the  corresponding  function.  The  natural  linear  spline  therefore  interpolates  the  data, 
and  is  the  smoothest  interpolant  to  the  data  from  X in  the  sense  that  it  possesses  the 
smallest  derivative  in  the  L2-norm.  If  we  are  to  pursue  this  very  natural  idea  of  making 
higher  derivatives  of  s small,  then  we  will  naturally  develop  seminorms  with  polynomial 
kernels.  This  goes  a long  way  towards  explaining  the  need  for  augmentation. 


Finally  in  this  introduction,  we  want  to  discuss  briefly  the  uses  to  which  radial  basic 
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function  interpolation  is  put.  There  are  two  significant  feelings  about  interpolation  by 
these  functions.  Firstly,  it  is  thought  that  radial  basic  function  interpolation  is  very 
good  for  treating  scattered  data.  Loosely  speaking,  data  is  scattered  when  there  is  no 
possibility  of  determining  either  a natural  choice  of  coordinate  axes,  or  an  origin.  It 
is  at  the  opposite  end  of  the  spectrum  to  gridded  data.  In  the  presence  of  a cartesian 
product  for  the  data  sites,  it  is  much  more  efficient  to  use  univariate  methods  together 
with  tensor  product  constructions  to  do  the  interpolation.  Secondly,  radial  basic  function 
interpolation  is  thought  to  be  very  good  for  dealing  with  high  dimensional  data.  There  is 
some  evidence  from  the  realm  of  neural  networks  that  this  is  indeed  the  case,  but  we  will 
not  venture  into  the  area  of  high  dimensional  data  interpolation  in  this  paper.  Finally, 
many  of  the  data  sets  we  want  to  treat  have  very  large  numbers  of  data  sites  and  so  our 
aim  is  to  develop  methods  which  will  handle  10,000  to  1,000,000  data  sites  or  more. 

2 Computational  difficulties  and  fast  evaluation 

In  this  Section,  we  want  to  discuss  the  difficulties  that  arise  when  a large  radial  basic 
function  interpolation  problem  is  posed.  We  shall  also  deal  with  one  of  the  essential  tools 
for  overcoming  some  of  the  difficulties.  The  system  we  want  to  solve  has  the  form 


m 


s(Zj) 

= a*^(l xi  - ^1)  + Pfa)  (i  = !.  • 

2=1 

..,m) 

(2.1) 

m 

q(xi),  for  all  g € 7rfc_i(lRn). 

(2.2) 

. 1 

If  we  declare  a basis  for  7Tfc_i(lRn)  then  we  can  write  these  equations  in  matrix  form  as 

U ?)(:)-(!)■ 

Here  the  matrix  A has  entries  <j){\xj  - x,|)  and  is  m x m.  The  matrix  Q has  entries 
Pt(xj),  where  pi , . . . , pv  is  a basis  for  7rfe_1(lRn),  and  is  of  size  m x v.  Recall  from  our 
assumptions  that  only  low  degree  polynomials  are  used,  and  so  Q is  a long  thin  matrix. 
In  the  case  of  thin-plate  splines  in  1R2  it  would  have  size  m x 3.  However,  A is  a very 
large  matrix,  with  absolutely  no  sparsity.  In  fact,  for  thin-plate  splines,  the  matrix  A 
is  zero  on  the  diagonal,  and  has  large  off-diagonal  entries.  In  solving  a large  system  of 
linear  equations,  the  only  effective  strategy  is  to  use  an  iterative  solver.  Such  a solver 
will  involve  many  multiplications  of  the  matrix  A with  a vector  a,  and  the  full  nature 
of  A makes  this  a very  costly  process.  One  of  the  key  discoveries  in  this  area  was  the 
Beatson  and  Newsam  [8]  result  which  showed  how  fast  multipole  algorithms  could  be 
applied  to  this  area.  If  we  consider  the  expression 

m 

s(xj)  = ^2ai\xj  - Xi\2\n(\xj  - Xi\)  + p(xj) 

, i=l 

for  some  Xj  £ IR",  then  this  can  be  considered  as  an  evaluation  of  the  function  s at  the 
point  Xj,  or  the  formation  of  an  element  in  the  matrix  vector  product  Aa.  Because  of 
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this,  most  authors  tend  only  to  consider  how  to  evaluate  the  function  s in  an  efficient  way 
— generating  what  are  known  as  fast  evaluation  algorithms.  It  is  impossible  to  estimate 
properly  the  importance  of  this  discovery.  Anyone  involved  in  programming  iterative 
solutions  to  the  thin-plate- spline  equations  with  tens  of  thousands  of  points  would  find 
that  any  such  algorithm  would  just  grind  itself  into  the  dust  without  this  technology.  The 
technology  really  has  two  aspects:  a mathematical  tool,  and  a programming  structure. 
Here  we  intend  to  give  only  the  flavour  of  the  argument.  The  reader  who  really  wants  to 
know  the  details  is  advised  to  look  either  at  the  original  paper  [8] , or  the  later  paper  of 
Beatson  and  Light  [5]  which  deals  with  polyharmonic  splines.  She  can  also  look  at  two 
papers  which  give  clear  explanations  of  simple  cases.  The  first  is  found  in  a survey  paper 
by  Beatson  and  Greengard  [3].  The  second  is  a technical  report  by  Beatson,  Levesley 
and  Light  [7].  This  last  paper  discusses  fast  evaluation  methods  on  the  circle  and  higher 
dimensional  spheres,  and  the  reader  will  find  a very  careful  and  full  account  of  the 
one-dimensional  circle  case.  The  first  trick  with  problems  in  1Ri— 2  is  to  consider  complex 
variables,  rather  than  points  in  H2.  Let  2 be  a point  at  which  we  wish  to  evaluate  s,  and 
u a data  point,  or  centre.  Then 


| z — w|2  In  \z-u\  — 1ZE (\z  — u|2  ln(2  - u ))  = HE (\z  — u\2  In  2)  + 1ZE  ^2  — u\2  In  ^1  - 

Look  at  the  last  two  expressions  here.  The  first  of  them  has  the  centre  u in  the  square  of 
the  modulus  term,  and  this  expression  is  quite  cheap  to  evaluate,  even  if  there  are  many 
centres  u.  The  effect  of  many  centres  on  the  second  term  is  however  quite  profound,  and 
it  is  with  this  term  that  we  must  work.  The  idea  is  to  set  a tolerance,  and  only  aim  to 
evaluate  s to  within  this  tolerance,  rather  than  exactly.  The  appropriate  series  expansion 
can  then  be  used: 


-p 


p= 1 


p= 1 


p= 1 


The  value  of  N depends  on  the  tolerance  demanded  of  the  evaluation  and  the  relative 
sizes  of  u and  z.  For  this  reason,  we  think  of  2 as  far  away  from  the  origin  in  1R2,  and 
u close  to  the  origin.  If  there  are  now  many  centres  ui , . . . , um  near  the  origin,  and  2 is 
far  away  from  the  origin,  then  we  can  summarise  the  effects  of  linear  combinations  of 
all  these  centres  as  follows: 


m 

Y^ai\Z  ~ ui|2ln  (i  - ~) 


m N 

i—  1 p=  1 

Nm  N 

'<^2^2aifp(ui)z~P  = ^29p(ui,---,Um)z~P 

p—1  i— 1 p— 1 


The  principle  now  is  to  use  the  last  expression  above  to  make  an  approximate  evaluation 
of  s.  Of  course,  the  assumption  that  2 was  far  from  the  origin  and  u± , . . . , um  were 
close  to  the  origin  is  not  important.  It  is  simply  important  that  2 be  far  away  from  the 
cluster  of  centres  u\, . . . , urn . The  summarising  expression  is  referred  to  as  a Laurent  type 
expansion,  because  it  summarises  the  contribution  of  the  centres  u\, . . . , urn  in  terms  of 


Fig.  1.  Fast  evaluation  panelling. 


series  involving  negative  powers  of  z.  There  is  now  a lot  of  preprocessing  to  go  on  before 
the  fast  evaluation  algorithm  is  ready  to  roll.  Figure  1 shows  how  the  algorithm  proceeds. 
The  shaded  square  at  the  bottom  left  of  the  domain  is  the  point  which  contains  z,  the 
evaluation  point.  All  the  squares  around  this  one  which  are  the  same  size  are  deemed 
to  be  ‘close’  to  the  evaluation  square.  All  other  squares  are  ‘far  away’.  Of  course,  as  the 
squares  get  further  away  from  z it  becomes  possible  to  use  our  summarising  technique 
to  total  up  the  contributions  of  larger  and  larger  numbers  of  points.  This  is  done  in  a 
very  explicit  manner,  which  is  represented  by  the  shading  in  Figure  1.  As  we  get  further 
away,  we  double  the  size  of  the  squares  over  which  we  summarise,  and  there  is  a band  of 
same-size  squares  (or  a ring,  if  the  evaluation  square  was  in  the  middle  of  the  domain) 
two  squares  wide  surrounding  the  evaluation  square.  Once  all  the  preprocessing  is  done, 
and  we  shall  discuss  this  a little  more  in  a moment,  all  the  needed  coefficients  gp  are 
available,  and  evaluation  can  be  carried  out  in  about  O (log  rn)  FLOPS  instead  of  0(m). 

The  above  account  does  not  quite  reveal  the  whole  story.  The  coefficients  gp  are 
calculated  in  an  orderly  manner  which  greatly  improves  the  efficiency  of  the  algorithm. 
Suppose  our  problem  is  located  in  [0,  l]2.  An  initial  decision  is  made  to  divide  the 
original  domain  into  squares  of  size  2-n.  There  is  then  a parent  child  relationship  derived 
through  a quad-tree  data  structure.  The  parent  [0,  l]2  has  four  children:  [0, 0.5]2,  [0.5,  l]2, 
[0,0.5]  x [0.5,1]  and  [0.5,1]  x [0,0.5].  This  parent-child  relationship  helps  in  setting  up 
the  coefficients  gp(u i,. . . , urn)  in  an  efficient  way.  There  is  also  a further  idea  involving 
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Taylor  series,  which  gives  more  efficiency.  We  omit  any  description  of  this  technique. 

3 Inverting  the  interpolation  matrix 

Recall  as  at  the  beginning  of  the  previous  section  that  the  equations  specifiying  the 
interpolation  problem  are  as  follows: 

m 

s(xj)  = ^ai<t>(\xj  ~xi\)+P(xj), . (j  = 1, ...  ,m)  (3.1) 

4=1 
m 

for  all  q G 7rfc_j(]R”).  (3.2) 

2=1 

In  matrix  terms  we  have 


where  A is  a full  matrix  which  tends  to  exhibit  poor  conditioning.  The  poor  conditioning 
of  A is  similar  to  problems  experienced  by  researchers  in  the  theory  of  finite  elements  — 
as  the  interpolation  points  become  very  dense  in  a given  region,  the  conditioning  gets 
worse.  In  fact,  there  are  formal  statements  relating  some  impression  of  the  condition 
number  of  A (usually  the  smallest  eigenvalue  of  A)  to  the  minimum  interpoint  distance. 
The  following  table  represents  the  condition  number  of  A when  the  interpolation  points 
are  given  on  a uniform  5x5  grid  in  [0,  a)2.  Of  course,  on  a philosophical  level,  it  does  not 


Scale  parameter 

Condition 

a 

Number 

1.0 

3.6458  x 10* 

0.1 

2.5179  x 104 

0.01 

2.4364  x 106 

0.001 

2.4349  x 108 

Tab.  1 Two  norm  condition  numbers  of  A. 


make  any  sense  whatsoever  to  describe  an  interpolation  problem  as  being  ill-conditioned. 
Let’s  discuss  this  point  in  a little  more  depth.  Suppose  X\, . . . , xrn  are  points  in  ]R",  and 
G i , . . . , Gm  are  a set  of  functions  from  Mn  to  IR  which  are  linearly  independent  over 
{x\, . . . ,xm}.  That  is,  interpolation  to  arbitrary  data  at  aq, . . . ,xm  by  linear  combina- 
tions of  G i, . . . , Gm  is  always  uniquely  possible.  Then  there  is  a basis  , . . . , Fm  for  the 
linear  span  of  Gi, . . . , Gm  such  that  Fi(xj)  is  1 if  i = j and  is  zero  for  all  other  values  of 
i,  j between  1 and  m.  If  the  given  data  is  d\, . . . , dm,  then  the  interpolant  can  be  written 
down  immediately  as 

m 

YjdiFiix)  {x  G IR”). 

2=1 
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If  one  has  in  one’s  hands  the  basis  {Flt . . . , Fm } and  wants  to  know  the  coefficients 
which  must  be  used  then  one  need  only  invert  the  identity  matrix  to  obtain  the  solution, 
and  there  are  not  many  matrices  which  are  better  conditioned  than  the  identity  matrix! 
Of  course,  getting  one’s  hands  on  the  basis  Fi,.. . , Frn  is  usually  rather  difficult  — as 
hard  as  solving  the  original  problem  in  fact.  It  has  become  traditional  to  refer  to  the 
basis  Fi , . . . , Frn  as  the  Lagrange  basis  (in  sympathy  with  the  fact  that  Lagrange  was  a 
person  who  wrote  down  this  basis  for  polynomial  interpolation  in  one  dimension)  or  the 
cardinal  basis.  This  last  term  seems  to  the  author  to  be  quite  appropriate,  indicating 
that  the  basis  is  special.  However,  it  does  not  find  favour  with  spline  theorists,  since 
they  think  of  the  word  cardinal  in  a very  technical  sense  (the  interpolation  points  are 
TLn).  Terminology  aside,  the  point  is  still  made  that  the  conditioning  of  any  interpolation 
problem  is  a function  of  the  available  basis.  A more  practical  case  of  this  phenomenon 
is  the  problem  of  natural  cubic  splines  in  1R.  They  fit  into  the  radial  basic  function 
interpolation  scenario,  because  a natural  cubic  spline  with  knots  at  aq , . . . , xrn  can  be 
written  as 

m 

s(:c)  = y^a*la:  ~ £t|3  + ax  + b (x  E H). 

2=1 

If  we  require  this  spline  to  interpolate  data  d\ , . . . , dm  at  Xi , . . . , xm  then  we  have  to 
require  that  s(xj)  = dj  for  j = 1, . . . ,m.  The  natural  property  comes,  as  expected,  from 
the  natural  boundary  conditions: 

m m 

g{  = ajXj  = 0. 

2 = 1 2=1 

The  ill-conditioning  illustrated  in  Table  1 would  be  equally  present  in  this  example,  and 
the  remark  that  the  conditioning  increases  as  the  interpoint  spacing  decreases  would  also 
hold  good.  Of  course,  to  suggest  the  use  of  this  basis  to  a spline  practitioner  would  not 
be  a good  idea!  We  are  well  used  to  the  idea  that  B-splines  are  the  correct  basis  to  use 
in  this  situation. 

I suppose  the  two  principles  to  emerge  from  the  above  discussion  are  that  the  basis 
we  have  used  thus  far  to  describe  the  interpolation  problem  is  not  satisfactory  from  a 
computational  point  of  view,  and  that  in  at  least  some  of  the  cases  under  discussion 
(all  of  them  one-dimensional)  there  are  other  bases  which  are  superior.  There  are  other 
ways  to  conceptualise  the  difficulties  we  experience  with  the  radial  basic  functions.  Most 
of  them  tend  to  grow  at  infinity,  and  have  small  value  at  zero.  As  a general  principle, 
we  would  like  a basis  to  mimic  the  B-spline  basis.  That  is,  we  would  like  the  basis  to 
be  local  if  possible  — each  basis  function  having  a fairly  small  support  around  one  of 
the  interpolation  points.  The  first  people  to  make  progress  in  this  area  were  Dyn  and 
Levin  [11]  in  1983.  There  is  a later  paper  with  Rippa  [12]  in  1986  which  is  also  worth 
looking  at.  Their  technique  was  based  on  the  observation  that  if  F(x)  = |x|2  In  |j-|,  and 
x € IR2,  then  V4F  = 8n5.  Here,  V4  represents  the  bilaplacian,  and  5 is  the  Dirac  delta 
distribution  whose  action  on  each  rapidly  decreasing  function  in  S is  to  evaluate  it  at 
zero.  This  description  alone  should  alert  us  to  the  fact  that  V4F  = 8ttS  is  a distributional 
equation,  and  as  such  must  be  handled  with  care.  However,  numerical  analysts  dash  in 
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where  others  fear  to  tread,  and  we  can  approximate  the  Laplacian  as  follows: 

(V2F)(z)  » h-2{F{x-he1)+F(x+he1)+F{x-he2)+F{x+he2)-AF(x)}  (x  € 1R2). 

Here  h is  a real  parameter,  and  e\  and  e2  are  the  usual  unit  vectors  in  1R2.  Pictorially,  we 
can  represent  this  approximation  by  the  stencil  shown  in  Figure  2.  The  bilaplacian  stencil 


4 1 

-n • 


Fig.  2.  The  stencil  for  the  Laplacian. 

is  shown  in  Figure  3.  This  observation  is  used  in  a straightforward  way  if  the  interpolation 
points  lie  on  a grid.  Instead  of  using  the  thin-plate  spline  radial  basic  function  to  generate 
a basis,  one  uses  the  appropriate  linear  combinations  which  represent  the  bilaplacian  of 
this  function.  Because  one  has  a distributional  equation  relating  this  quantity  to  the  6 
function,  one  does  not  expect  to  get  the  5 function  exactly,  but  one  certainly  does  expect 
to  get  a function  which  decays  rapidly  at  oo,  and  this  is  exactly  what  happens.  Dyn  and 
Levin  provide  some  encouraging  numerical  results.  Of  course,  there  remains  the  problem 
of  what  to  do  when  the  data  is  not  gridded.  Here  one  must  develop  first  the  appropriate 
stencil  for  the  Laplacian  on  a point  by  point  basis.  This  may  seem  laborious,  but  in  fact 
the  next  few  methods  we  will  describe  all  compute  better  basis  elements  on  a point  by 
point  basis. 

Perhaps  the  most  successful  class  of  schemes  of  this  nature  — computing  a new 
basis  on  a point  by  point  approach  — comes  from  Beatson,  Goodsell  and  Powell  [2]  and 
Beatson,  Cherie  and  Mouat  [1],  Their  approach  is  perhaps  simpler  to  appreciate  and 
implement  than  that  of  Dyn  and  Levin.  They  begin  with  the  observations  I made  earlier 
— what  we  are  really  after  is  the  cardinal  basis  Fi, . . . , Frn  with  the  property  that  Fi(xj) 
is  1 if  i = j and  is  zero  for  all  other  values  of  i,  j between  1 and  m.  However,  because 
this  problem  is  as  difficult  to  solve  as  the  original  one,  we  proceed  as  follows.  Consider 
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Fig.  3.  The  stencil  for  the  bilaplacian. 


the  job  of  trying  to  construct  Fj.  This  function  is  supposed  to  be  1 at  X{  and  zero  at  all 
other  points.  Choose  about  50  near  neighbours  of  Xi,  say  yj  e {aq, . . . ,xm}.  This  choice 
must  include  X{.  Then  take 

50 

Fi(x)  = ^ a,j \x  - yj\ 2 In  \x  - yj | + bx  + c,  (x  e B2). 
j=i 

We  demand  that 

/ 1 

usb;  | o otherwise, 

and  that  the  natural  boundary  conditions  are  also  satisfied.  Thus  we  are  producing 
approximate  cardinal  functions  which  have  the  value  1 at  the  required  point,  but  are  only 
zero  on  about  50  neighbouring  points.  This  suggestion  is  based  on  the  fact,  observed  by 
many  workers,  that  such  functions  are  often  small  elsewhere  in  the  domain.  We  produce 
some  pictures  to  illustrate  this.  In  the  first  (Figure  4),  289  points  are  spaced  on  a regular 
grid  in  [0,  l]2.  The  approximate  cardinal  function  is  based  on  the  13  points  shown  in  bold 
in  Figure  4.  Figure  5 illustrates  the  same  situation,  but  now  as  shown  the  points  used  to 
develop  the  cardinal  function  are  all  clustered  in  one  corner  of  the  domain.  The  effect  is 
to  produce  significant  values  at  the  opposite  corner  of  the  domain.  One  can  infer  from 
this  that  whenever  the  data  is  pretty  much  uniformly  distributed,  the  cardinal  functions 
using  points  well  inside  the  domain  will  have  good  properties,  while  those  at  the  edge 
will  be  poor.  Similarly,  in  a non-uniform  distribution,  those  interior  to  a cloud  of  points 
will  behave  well,  while  those  at  cloud  boundaries  might  not. 

There  are  two  methods  for  dealing  with  the  difficulties  which  have  shown  up  above. 
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Fig.  4.  Approximate  cardinal  function  with  points  central  to  the  domain. 
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Fig.  5.  Approximate  cardinal  function  with  points  at  one  corner  of  the  domain. 


Firstly,  one  can  pin  all  the  cardinal  functions  at  a fixed  set  of  judiciously  chosen  points 
— so  that  every  cardinal  function  must  have  the  value  zero  at  these  points.  This  is  very 
effective  in  the  case  of  regularly  spaced  data  as  Figures  6 and  7 show.  One  can  imagine 
however,  that  a data  set  with  a number  of  clouds  might  benefit  from  a judicious  choice 
of  points  at  which  to  carry  out  the  pinning.  What  one  would  really  like  is  a method 
which  does  not  rely  on  any  user  intelligence  in  the  choice  of  points.  As  mentioned  before, 
a desirable  feature  of  a good  basis  function  is  one  which  decays  at  infinity.  This  decay 
should  be  at  some  rate  if  possible.  The  Beatson,  Cherie  and  Mouat  prescription  for  thin- 
plate  splines  in  1R2  is  that  the  elements  should  decay  like  |x[-3  as  \x\  — ► oo.  There  is  a 
problem  here,  in  that  if  we  opt  for  decay  elements  everywhere,  then  we  will  not  obtain 
a basis  for  our  space.  To  get  around  this  problem,  we  accept  an  element  F,  as  a decay 
element  if  it  satisfies 

50 

'y  1 \pi{yj) — i f m 
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Fig.  7.  Approximate  pinned  cardinal  function  with  points  at  one  corner  of  the  domain. 


and 

|Fj(a;)|  = (D(\x\~3)  as  |m|  — ► oo. 

Otherwise,  we  use  the  Fi  which  is  defined  by  the  previous  condit  ions  of  cardinality.  Again 
there  are  a few  bells  and  whistles  needed  to  make  this  method  operate  efficiently,  but 
we  hope  that  sufficient  detail  is  present  for  the  reader  to  be  able  to  see  the  general  idea. 
All  the  above  methods  are  providing  ways  of  constructing  a better  conditioned  basis 
with  which  to  solve  the  problem.  A method  still  has  to  be  selected  to  invert  the  matrix 
associated  with  the  new  basis,  which  is  now  much  better  conditioned  than  the  original 
matrix  corresponding  to  the  conventional  basis.  The  method  of  choice  for  most  authors 
is  some  version  of  GMRES. 

Beatson  called  the  points  at  which  decay  could  be  obtained  ‘good’  points,  and  points 
at  which  decay  could  not  be  obtained  ‘bad’  points.  This  idea  has  been  built  on  in  a 
recent  technical  report  by  Beatson  and  Levesley  [4].  The  general  spirit  is  to  define  good 
and  bad  points  in  the  same  way  as  Beatson,  and  then  to  develop  an  iterative  solver, 
solving  first  on  the  good,  then  the  bad,  then  returning  to  the  good  and  so  on. 


Computing  with  radial  basic  functions 

Finally,  a very  successful  method  has  recently  emerged  from  the  researches  of  Beatson, 
Light  and  Billings  [6].  This  method  has  the  advantage  that  it  is  a fast  iterative  solver 
which  may  be  regarded  as  a preconditioner  in  its  own  right  (thus  it  may  be  combined 
with  a solver  such  as  GMRES).  We  will  describe  it  here  as  a solver.  It  is  essentially 
the  domain  decomposition  method,  although  as  with  previous  solvers,  our  description 
will  be  very  much  at  a ‘bare  bones’  level,  and  the  interested  reader  is  referred  to  [6]  for 
the  fine  details,  which  include  some  error  estimates,  some  interesting  comments  on  an 
alternative  basis,  and  a good  deal  of  theory.  We  shall  describe  the  method  as  applied  to 
data  on  the  unit  square  [O',  l]2  in  St2,  and  we  will  not  make  any  attempt  to  make  the 
method  adaptive  in  character.  The  reader  will  be  able  to  see  these  improvements  for 
herself.  We  will  test  our  method  on  randomly  chosen  data  in  [0,  l]2. 

We  begin  with  a set  of  nodes  X = {xi,...,xm}  at  which  interpolation  is  to  be 
carried  out.  We  will  describe  the  algorithm  as  it  is  implemented  for  solving  the  thin- 
plate  spline  interpolation  problem  on  the  node  set  X.  We  divide  up  the  square  [0,  l]2 
into  a fairly  large  number  of  sub-domains  X\, . . . , X(.  There  are  two  constraints  on  these 
subdomains.  It  is  important  that  they  are  constructed  so  that  about  equal  numbers  of 
points  lie  in  each  subdomain  — about  50  points  per  subdomain  is  ideal.  Secondly,  it  is 
essential  that  each  subdomain  overlap  all  surrounding  subdomains.  In  our  terminology, 
two  subdomains  overlap  if  they  have  a (small)  number  of  points  in  common.  In  each 
subdomain  there  are  some  points  in  X which  lie  only  in  that  subdomain  and  not  in  any 
other.  We  call  these  points  the  inner  points  of  the  subdomain.  A coarse  set  Y of  inner 
points  in  the  node  set  X is  also  chosen.  We  will  say  more  about  this  coarse  set  in  a 
moment,  but  at  this  stage  it  simply  consists  of  a small  number  of  inner  points  from  each 
subdomain.  The  algorithm  will  then  construct  the  interpolant  s and  proceeds  as  follows. 
We  initialise  the  interpolant  s as  s = 0.  We  want  to  solve  the  equations 

m 

dj  = s(xj)  = ^ ai\xj  - Xi\2  In  \xj  - aq|  + axj  + (3,  (j  = 1, . . . , m)  (3.3) 

i— 1 

subject  to  the  boundary  conditions 

m ra  m K 

^ ^ Q-i  = ^ ^ O'iSi  = ^ ^ 0>iti  — 0,  (3.4) 

2=1  2=1  2=1 

where  Xi  = In  matrix  form  these  equations  are 

( QT  0 ) ( b ) = ( 0 ) ’ 

as  we  have  already  seen.  Our  method  will  operate  by  residual  correction,  so  we  begin  by 
setting 

r=(o)- 

It  is  important  to  recall  that  a is  a vector  of  length  2,  which  we  write  as  a = (cti , o^)- 
Suppose  now  we  have  begun  our  iterative  procedure  and  generated  an  approximation  s 
with  a residual  r.  The  next  few  steps  describe  how  to  update  the  approximation  and  the 


residual. 

Step  1.  We  construct  such  that  each  Sj  is  an  interpolant,  based  only  on  all 

points  of  the  subdomain  Xj,  using  as  data  the  residual  vector  r restricted  to  Xj. 

Step  2.  For  each  inner  point  x we  now  have  a single  real  number  ax  which  is  the 
coefficient  of  | • - x\ 2 In  | ■ — x|.  If  we  look  at  the  collection  of  coefficients  belonging  to  all 
the  inner  points  of  all  domains,  then  this  collection  is  not  in  general  orthogonal  to  tti  . 
That  is,  they  fail  to  satisfy  boundary  conditions  of  the  type  given  in  Equation  (3.4).  We 
now  correct  so  that  the  collection  of  coefficients  corresponding  to  all  inner  points  of  all 
domains  is  orthogonal  to  7Ti . 

Step  3.  We  set 


S\  = ^{a^l  • — x|2  In  | • — xj  : x is  an  inner  point}.  (3.5) 

Step  4.  We  evaluate  the  residual  TZ  = r-S\  at  the  coarse  grid  points,  and  then  construct 
the  interpolant  S 2 to  this  residual  on  the  coarse  grid  points  Y. 

Step  5.  We  update  s by  s <—  s + <Si  + S2-  The  new  residual  is  then  given  by 


where 


Zi  = di-s(xi),  i = 


This  iterative  process  can  either  be  continued  to  convergence,  or  used  as  a preconditioner 
followed  by  GMRES.  Table  2 shows  some  run  times  taken  to  obtain  an  error  of  less  than 
1 x 1(T6  for  the  Franke  1 function  (see  [13]  for  the  definition  of  this  function).  Random 
nodes  were  generated  in  [0,  l]2  and  an  Intel  Celeron  PC  was  used.  Recently, 


Number 
of  nodes 

Number  of 
iterations 

Time 

(seconds) 

10,000 

8 

7.0 

20,000 

8 

17.5 

40,000 

6 

35.5 

80,000 

6 

105.7 

160,000 

7 

407.8 

Tab.  2 Run  times  for  domain  decomposition. 


the  group  at  Leicester,  using  a twin  processor  Compaq  PC,  has  obtained  solutions  to 
a problem  with  1,000,000  random  points  in  leas  than  9 minutes,  and  we  can  safely  say 
that  the  combination  of  domain  decomposition  methods  and  multipole  fast  evaluation 
has  produced  a robust  and  effective  method.  Most  practitioners  will  be  aware  of  other 
ways  to  run  a domain  decomposition  algorithm.  In  particular,  one  can  use  a nesting 
approach  where  one  starts  with  only  four  subdomains  each  containing  large  numbers 
of  points.  To  solve  each  subdomain  problem,  one  subdivides  again  and  does  domain 
decomposition  in  the  subdomain. 
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